arXiv:1509.00500vl [stat.ME] 1 Sep 2015 


Estimation of delta-contaminated density of the 
random intensity of Poisson data 

Daniela De Canditiis, 

Istituto per le Applicazioni del Calcolo ”M. Picone”, CNR Rome, Raly 

Marianna Pensky, 

Department of Mathematics, University of Central Florida 


Abstract 

In the present paper, we constructed an estimator of a delta contaminated mixing density 
function g{X) of the intensity A of the Poisson distribution. The estimator is based on an 
expansion of the continuous portion go (A) of the unknown pdf over an overcomplete dictionary 
with the recovery of the coefficients obtained as solution of an optimization problem with Lasso 
penalty. In order to apply Lasso technique in the, so called, prediction setting where it requires 
virtually no assumptions on dictionary and, moreover, to ensure fast convergence of Lasso 
estimator, we use a novel formulation of the optimization problem based on inversion of the 
dictionary elements. The total estimator of the delta contaminated mixing pdf is obtained 
using a two-stage iterative procedure. 

We formulate conditions on the dictionary and the unknown mixing density that yield a 
sharp oracle inequality for the norm of the difference between go (A) and its estimator and, 
thus, obtain a smaller error than in a minimax setting. Numerical simulations and comparisons 
with the Laguerre functions based estimator recently constructed by Comte and Genon-Catalot 
(2015) also show advantages of our procedure. At last, we apply the technique developed in 
the paper to estimation of a delta contaminated mixing density of the Poisson intensity of the 
Saturn’s rings data. 
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1 Introduction 

Poisson-distributed data appear in many contexts. In the last two decades a large amount of effort 
was spent on recovering the mean function in the Poisson regression model. In this set up, one 
observes independent Poisson variables Yf,--- ,Yn where 1) with respective means Xi = f{i/n), 
i = I,-- - ,n. Here, / is the function of interest which is assumed to exhibit some degree of 
smoothness. The difficulty in estimating / on the basis of Poisson data stems from the fact 
that the variances of the Poisson random variables are equal to their means and, hence, do not 
remain constant as / changes its values. Estimation techniques are either based on variance 
stabilizing transforms (Brown et al. (2010), Fryzlewicz and Nason (2004)), wavelets (Antoniadis 
and Sapatinas (2004), Besbeas et al. (2004), Harmany et al. (2012)), Haar frames (Hirakawa and 
Wolfe (2012)) or Bayesian methods (Kolaczyk (1999) and Timmermann and Nowak (1999)). The 
case of estimating Poisson intensity in the presence of missing data was studied in He et al. (2005). 
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The fact that the variance of a Poisson random variable is equal to its mean serves as a 
common and reliable test that data in question are indeed Poisson distributed. However, in many 
practical situations, although each of the data value Tj ~ Poisson{Xi), i = 1, ■ ■ ■ ,n, the overall 
data do not have Poisson distribution. This is due to the fact that consecutive values of Aj are 
so different from each other that / is not really a function. In this case, in order to account for 
the extra-variance, it is usually reasonable to assume that A itself is a random variable with an 
unknown probability density function g which needs to be estimated. 

In particular, below we consider the following problem. Let Aj, 1 = 1, • • • , n, be independent 
random variables that are not observable and have an unknown pdf g{X). One observes variables 
Yi\Xi ~ Poisson{Xi), i = 1, ■ ■ ■ ,n, that, given Aj, are independent. Our objective is to estimate 
g{X), the so called mixing density, on the basis of observations Yi, • • • ,y„. Here, g can be viewed 
as the prior density of the parameter A, so that the model above reduces to an empirical Bayes 
model where the prior has to be estimated from data. 

Estimation of the prior density of the parameter of the Poisson distribution has been con¬ 
sidered by several authors. For example, Lambert and Tierney (1984) suggested non-parametric 
maximum likelihood estimator, Walter (1985) and Walter and Hamedani (1991) studied estimators 
based on Laguerre polynomials, Zhang (1995) considered smoothing kernel estimators and Hern- 
gartner (1997) investigated Fourier series based estimators of g. All papers listed above provided 
the upper bounds for the mean integrated squared error (MISE); Zhang (1995) and Herngartner 
(1997) also presented the lower bounds for the MISE over smoothness classes. The common feature 
of all these estimators is that the convergence rates are very low. In particular, if n —>■ oo, both 
Zhang (1995) and Herngartner (1997) obtained convergence rates of the form (lnn/lnlnn)“^^ 
where n is the parameter of the smoothness class to which g belongs. The latter seem to imply 
that there is no hope for accurate estimation of the mixing density g unless the sample sizes are 
extremely high. On a more positive note, in a recent paper, Comte and Genon-Catalot (2015) 
considered an estimator of g based on expansion of g over the orthonormal Laguerre basis. They 
showed that if Laguerre coefficients of g decrease exponentially, then the resulting estimator has 
convergence rates that are polynomial in n and provided some examples where this happens. More¬ 
over, they proposed a penalty for controlling the number of terms in the expansion and provided 
oracle inequalities for the estimators of g under various scenarios. 

The low convergence rates for the prior density of Poisson parameter are due to the fact 
that its recovery constitutes a particular case of an ill-posed linear inverse problem. Indeed, let 
L^[0, oo) and be the Hilbert spaces of, respectively, square integrable functions on [0, oo) and 
square integrable sequences. Denote the probability that Y = /, / = 0,1, • • • , by P{1) = P(y = 1). 
Then, introducing a linear operator Q : L^[0,oo) , we can present g{X) as the solution of the 

following equation 

TOO -A 

{Qgm = j^ -^g{X)dX = P{l), / = (1.1) 

Since exact values of the probabilities P{1) are unknown, they can be estimated by relative fre¬ 
quencies ui, so the problem of recovering g appears as an ill-posed linear inverse problem with the 
right-hand side measured with error. Solution of equation dni) is particularly challenging since 
the 5 is a function of a real argument while P is the inhnite-dimensional vector. 

In the last decade, a great deal of effort was spent on recovery of an unknown function 
in regression setting from its noisy observations using overcomplete dictionaries. In particular, 
if the dictionary is large enough and the function of interest has a sparse representation in this 
dictionary, then it can be recovered with a much better precision than when it is expanded over 
an orthonormal basis. Lasso and its versions (see e.g. Biihlmann and van de Geer (2011) and 
references therein) allow one to identify the dictionary elements that guarantee efficient estimation 


2 



of the unknown regression function. The advantage of this approach is that the estimation error 
is controlled by the, so called, oracle inequalities that provide upper bounds for the risk for the 
particular function that is estimated rather than convergence rates designed for the “worst case 
scenario” of the minimax setting. In addition, if the function of interest can be represented via a 
linear combination of just few dictionary elements, then one can prove that it can be estimated 
with nearly parametric error given certain assumptions on the dictionary hold. 

In the present paper, we extend this idea to the case of estimating a mixing density g on 
the basis of Ti,-- - ,1^. However, there is an intrinsic difficulty arising from the fact that the 
problem above is an ill-posed inverse problem. Currently, one can justify convergence of a Lasso 
estimator only if stringent assumptions on the dictionary, the, so called, compatibility conditions, 
are satisfied. In regression set up, as long as compatibility conditions hold, one can prove that 
Lasso estimator is nearly optimal. Regrettably, while compatibility conditions may be satisfied for 
the functions in the original dictionary, they usually do not hold for their images due to contraction 
imposed by the operator Q. In the present paper, we show how to circumvent this difficulty and 
apply Lasso methodology to estimation of g. We formulate conditions on the dictionary and the 
unknown mixing density that yield a sharp oracle inequality for the norm of the difference between 
g{X) and its estimator and, thus, result in a smaller error than in a minimax setting. Numerical 
simulations and comparisons with the Laguerre functions based estimator recently constructed by 
Comte and Genon-Catalot (2015) also show advantages of our procedure. 

Our study is motivated by analysis of the astronomical data, in particular, the photon counts 
Yi,i = 1, ■ ■ ■ , n that come from sets of observations of stellar occultations recorded by the Cassini 
UVIS high speed photometer at different radial points on the Saturn’s ring plane. It is well known 
that Saturn ring is comprised of particles of various sizes, each on its own orbit about Saturn. 
With no outside influences, these photon counts should follow the Poisson distribution, however, 
obstructions imposed by the particles in the ring cause photon counts distribution to deviate from 
Poisson. The latter is due to the fact that although, for each i = I,-- - ,n, the photon counts 
Yi ~ Poisson{\i), the values of A^, i = 1, • • • , n, are extremely varied and, specifically, are best 
described as random variables with the unknown underlying pdf g{X). 

In addition, if a ring region contains a significant proportion of large particles, those particles 
can completely block out the light leading to zero photon counts. For this reason, we assume that 
the unknown pdf g is delta-contaminated, i.e., it is a combination of an unknown mass ttq at zero 
and a continuous part, so that (7(A) can be written as 

5(A) = 7ro(5 (A)-k (1 - 7 ro) 5 o(A) (1.2) 

where goiX) is an unknown pdf and (5(A) is the Dirac delta function such that, for any integrable 
function / one has f f{x)5{x)dx = /(O). Models of the type (jl.2p also appear in other applied 
settings (see, e.g.. Lord et al. (2005)). However, to the best of our knowledge, we are the first ones 
to estimate the delta-contaminated density of the intensity parameter of the Poisson distribution. 
In this setting, we also obtain a sharp oracle inequality for the norm of the difference between 
go{X) and its estimator. We also derive convergence rates for the estimator tto of the mass ttq at 
zero. The estimator has also been successfully applied to recovery of delta-contaminated densities 
of the intensities A for various sub-regions of the Saturn’s rings. 

Finally, we should remark on several other advantages of the approach presented in the paper. 
First, although in the paper we are using the gamma dictionary, the technique can be applied with 
any type of dictionary functions since it is based on a numerical inversion of dictionary elements. 
Moreover, the method can be used even if the underlying conditional distribution is different from 
Poisson. The estimator exhibits no boundary effects and performs well in simulations delivering 
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small errors. Moreover, since we apply Tikhonov regularization for recovering inverse images of 
the dictionary elements, our estimator can be viewed as a version of an elastic net estimator (\27\). 

The rest of the paper is organized as follows. Sections [5] and [3] present, respectively, the 
method and the algorithm for construction of an estimator of the unknown density function, while 
Section 0] studies its convergence properties. Section 0] investigates precision of the estimators de¬ 
veloped in the paper via numerical simulations with synthetic data. Section [6] provides application 
of the technique proposed in the paper to the occultation data for the Saturn’s rings. Finally, 
Section [7] contains the proofs of the statements presented in the paper. 


2 The Lasso estimator of the mixing density 


In what follows, we assume that S'o(-^) in ill-21) can be well approximated by a dictionary that 
consists of gamma pdfs 

M exp(-A/6fc) 

= 7(A;afc,^fc) =- b‘"‘T{ak) -’ 

This is a natural assumption since, for any fixed = h and = 1,2,-•• , the linear span of 
k = 1, - ■ ■, coincides with the space T^(0, oo), so that a linear combination of 4>k, fc = 1, • • • ,p, 
with large p approximates any square integrable function with a small error. Indeed, for a fixed 
bk = b and = 1, 2,3, • • • , this dictionary contains linear combinations of the Laguerre functions 
and, hence, its span approximates L^[0,oo) space. On the other hand, using a variety of scales bk 
allows one to accurately represent a function of interest with many fewer terms. 

Using this dictionary, we estimate g by 

p 

5(A) = vro(5(A)-h (1 - TTo) ^40fc(A), (2.2) 

k=l 


applying a two-step procedure. If the estimator vro were already constructed, coefficients Ck, k = 
1, • • • ,p, could be chosen, so to minimize the squared L^-norm 


ll5-5lli = ll5-^o'5||i + (l-vro)" 


Ck4>k 


k=l 


- 2(1 - TTo) "^Ckig - ^o6,(j)k). 


(2.3) 


k=l 


The hrst term in formula (j2.3p does not depend on coefficients Ck while the second term is com¬ 
pletely known. In order to estimate the last term, note that {g — 'kQ 5 ,(j)k) = {g, 4 >k) — ^o 4 >kiO)- 
Moreover, if we found functions Xk £ such that 


OO _\ ^ 

(Q*Xfc)(A) = Xfc(*) = </>fc(A), VAG(0,-boo), (2.4) 

i=0 


then, it is easy to check that 

r+oo 


^+oo ^ A\2 ^ /•+00 

{gAk) = / 5 (A) ^— T^Xk{'i)d\ = '^Xk{i) i 5 (A)—r 

Jo P- ^r. Jo P- 

OO 

= J;XA:(^)P(^)=IEXfc(n■ 


dX 


i=0 


(2.5) 


i=0 
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Here, P{1) is the marginal probability function 

p 

P{l)=F{Y = l) = 7roI{l = 0) + {l-no)^CkUkil), / = 0,1,2,--- (2.6) 

k=l 

where ][(/ = 0) is the indicator that I = 0. Hence, {g, (pk) can be estimated by 

n oo 

(5r,(/)fc) = k = l,...,p, (2.7) 

i=l 1=0 

where 

n 

n=n-^Y.^{Yi = l), / = 0 , 1 ,--- ( 2 . 8 ) 

i=l 

are the relative frequencies olY = I and 11(H) is the indicator function of a set A. 

There is an obstacle to carrying out estimation above. Indeed, for some values of and 
in formula (12.111 . solutions Xk^X) of equations (j2.4|] may not have finite variances or variances may 
be too high. In order to stabilize the variance we use Tikhonov regularization. In particular, we 
replace solution Xk = {Q*)~^<Pk of equation (12.4p by solution ipk,i^k equation 

{QQ* + CklXkXk = Q4’k, Ck > 0, (2.9) 

where operators Q and Q* are defined in (II.ip and (12.4p . respectively, and I is the identity operator, 
so that, for any /, 

{QQ*f){j) = E p I 2-(^'+'+iV(0, j = 0 , 1 ,... 

1=0 ^ ^ 


Observe that YailtpkXkO^)] ^ decreasing function of Ck while the squared bias — {g, (pkX 

is an increasing function of Ck- Denote Ck the unique solution of the following equation 


1 


-Var[V^,_^~ (y)] = - {g, 


and replace XkX) in i\2.7^ by 


V’fc(y) = V’fc^f^(y) with al = Var[V’fc(y)]- 


( 2 . 10 ) 


( 2 . 11 ) 


In order to identify the correct subset of dictionary functions cpk-, we introduce a weighted 
Lasso penalty. In particular, the vector of coefficients c with components c^, A; = 1, • • • ,p, can be 
recovered as a solution of the following optimization problem 


c = argmin \ (1 — ffo)^ 


'y ^ CkC’k 


k=l 


2(1 -fto) '^Ck [Xk,i^) -^ 0 </>fc( 0 )] +ac'^ak\ck\ 


k=l 


k=l 


Here, Y%=i ^k\ck\ is the weighted Lasso penalty and ac is the penalty parameter. 

Now, consider the problem of estimating the weight ttq when coefficients c^, k = 1 
are known. Denote 


Uk{l) = [ 

Jo 


+0O A 




( 2 . 12 ) 

• • • ,P, 

(2.13) 
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and recall that the marginal probability function is of the form (12.6p . Hence, up to the term that 
does not depend on ttq, given the data vector Y and the vector of coefficients c, the log-likelihood 
of ttq can be written as 


logL(7ro|Y,c) = nz/Q log { vro (1 - tto)C kUk(O) j + n(l - t'o)log(l - vro). (2.14) 


k=l 


If u is a vector with components = C/fc(0), then the expression (|2.14p is maximized by 


vr, 


MLE _ ^0 — C U 
0 


1 — u 


T ’ 

-1 n 


(2.15) 


In order to implement optimization procedure suggested above, consider matrix $ E 
with elements cpi), l,k = 1, ■ ■ ■ ,p, and define vectors z and ^ in with components 

oo n 

Zk = (t>k{^), ik = i'ipk,^) = = ^ipkiYi). (2.16) 

1=0 i=l 

Denote 0 = [1 — t:q)c and re-write optimization problem (12.1211 in terms of vector 6 as 

0 = argmin 1 — 29"^— ttoz ) -|- a'S^ak\0k\ 1 , 

^ I k=l ) 

where the penalty parameter a is related to ac in (I2.12p as Ofc = (1 — fro)«- Introduce matrix W 
such that $ = W^W and vector 


^ = (W^)+(^-7roz) = W(W^ W)-i(^-7roz), 


(2.17) 


where, for any matrix A, matrix is the Moore-Penrose inverse of A. Then, for a given value 
of TTo, optimization problem (I2.12p appears as 


9= argmin < IIW0 — rylll-|-a ak\ 

9 ^ 


(2.18) 


k=l 


Now, we need to re-write an estimator for ttq in terms of vector 9. For this purpose, replace c by 
(1 — 9 and solve equation (|2.15ll for obtaining 


vr, 


MLE 

0 


= zvq — 9 ^ u . 


Since vro > 0, we estimate vro by 


ttq = max(0, zvg — 9^u). 


(2.19) 


3 Implementation of the Lasso estimator 

Formulae (I2.18P and (|2.19ll suggest the following two-step optimization procedure. 
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Algorithm 


1 . 

2 . 

3. 


Evaluate sample frequencies vi, I = 0,1 ,given by formula (|2.8|) . 


Choose initial value vr 


( 0 ) 

0 


^0 


and obtain the vector of coefficients 6 


( 0 ) 


minimizing (|2.18p 


Find new value of vro using formula ()2.19p as tTq^ = max[0, i^o — 

new estimator 6 ^ of coefficients 0 by minimizing (j2.18l) . with tto = 
for j = 1,2,... until one of the following stopping criteria is met; 


u]. Then, obtain 
Repeat this step 


(i) = 0; (a) \\Wd^^'’ - < tol] (in) j > Jmax- 


Here tol and Jmax are, respectively, the tolerance level and the maximal number of steps 
defined in advance. 


4. Obtain the estimator 

p 

g(X) = 7fo(5(A) + ^ 0k4>kY)- (3.1) 

k=l 

Note that the algorithm described above is significantly simplified if 4>k{0) = 0 for all k = 
1,... ,p. Indeed, in this case, vector z = 0, so that vector 77 in (|2.17l) is independent of tto. In this 
case, one does not need iterative optimization. In particular, vector of coefficients 6 is recovered 
as solution of optimization problem (I2.18P and tto is constructed according to formula (|2.19p . In 
the present version of the paper, we considered this option. Indeed, in addition to computational 
convenience, choosing (j)k{0) = 0 for all fc = 1,... ,p, guarantees convergence of the Lasso estimator 

mi- 

In order to implement Lasso estimator, for any (^k, we need to obtain a solution V'fc.Cfc of 
equation (|2.9I) . For his purpose, we introduce a matrix version Q of operator Q in (II.ip . The 
elements of matrix Q are Poisson probabilities Qu = e“^*(xj)^/(/!), where Xi = ih, i = 1,2..., are 
the grid points at which we are going to recover g{X) and h is the step size. Introduce vectors (j)k 
and ^ = 1) ■ ■ ■ jP) with elements (f)k{xi), i = 1, 2..., and ipki^), I = 0,1, ..., respectively. Then, 

for each k = 1,... ,p, equation (|2.9I) can be re-written as 

= {QQY + (3.2) 

where I is the identity matrix. For the sake of finding (^k satisfying (j2.10l) . we created a grid 

and chose (^k so that to minimize an absolute value of \ai[ipkX^,(X)] — ~ isYk)'^ where 

yar[ipkXkO^)] the sample variance of ipkXkY")- After that, we evaluated 'i’kiX) in (j2.1ip and 
replaced unknown variances cr| in (| 2 . 1 ip by their sample counterparts. 


4 Convergence and estimation error 

Let g{X) be given by (j3.1|) . In order to derive oracle inequalities for the error of g{X), we introduce 
the following notations. Let 


p 

/(A) = (1 - 7ro)c/o(A), ft = 

i=i 


(4.1) 
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For any vector t G denote its and norms by, respectively, ||t|| 2 , ||t||i, ||t||o and 

||t||oo- Similarly, for any function /, denote by ||/|| 2 , ||/||i and ||/||oo its L^, and norms. 
Denote V = {!,■ ■ ■ ,p}. For any subset of indices J C P, subset is its complement in V and | J| 
is its cardinality, so that \V\ = p. Let Cj = Span{(/)j, j G J}. li J <ZV and t G M^, then tj G 
denotes reduction of vector t to subset of indices J. Let 6q be coefficients of the projection of 
/ = (1 — TTojg'o on the linear span of the dictionary i.e., = proj^^/. Let Jq = supp(0o)- 

Denote by Aniin(?7i) and AmaxC?^) the minimum and the maximum restricted eigenvalues of matrix 
$ 


Amin(m.) = 


Denote T = diag(iTi, • • • , Up), 


mm 

tGKP 

|t||o<7Tl 


| H -||2 ■ 


Amax(^) — 




max 

teRP 

|t||o<m 


V{pi,J) = {deW ■. ||(Td)jc||i <^||(Td)j||i}, ^>1, 


(4.2) 


(4.3) 


and consider the set Q{Ca-) of subsets J CV 


G{Ca) = < J CV : max — < Co 

' aj> 


(4.4) 


It turns out that, as long as the sample size n is large enough, estimator /g is close to / 
with high probability, with no additional assumptions. Indeed, the following statement holds. 

Theorem 1 Let (/)fc(0) = 0, /c = 1, • • • ,p, and n > Nq where 


16 

A^o = Tr('''+ 1) logp max 
9 


2 ^ 

oo 




Then, for any r > 0 and any a > ceo; with probability at least 1 — 2p , one has 

p 


Wfg-fWl < inf 


ll/t -/ll2 + 4a^fTj|tj 


where 6 is the solution of optimization problem 112.18\} and 

ao = (2v^(r + I)logp + 1) 


(4.5) 


(4.6) 


(4.7) 


Theorem [T] provides the, so called, slow Lasso rates. In order to obtain faster convergence 
rates and also to ensure that vro is close to ttq with high probability, we impose the following two 
conditions on the dictionary (fk, k = 1, • • • ,p, and the true function /. The first condition needs 
to ensure that the dictionary {4>j, j G V} is incoherent and it can be warranted by the following 
assumption introduced in [3]: 


(Al) For some s, 1 < s <pj2, some m > s such that s + m < p and some constant Cq one has 

m Amin (s + m) > Cq 5 Amn,x(m), (4.8) 

where Aniin(s + and are restricted eigenvalues of matrix $ defined in (14.2p . 
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Observe that, if J € G{Ca), condition d € J) implies that ||djc||i < ^CfflldjUi, so that 
Lemma 4.1. of Bickel et al. (2009) yields that 




min min 

Jeg(Ccy) deXl(M,J) 
|J|<S d :^0 


d^$d 

lidjili 


> Amin(s + m) 



f-^Ccr\/ 'SAniax(^) 
V^mAmin(s + rn) 


2 

> 0 


(4.9) 


provided Assumption (Al) holds with Cq = fiC^- 

As a second condition, we assume that the true function / in (14.111 is such that its “good” 
approximation can be achieved using J G G{Co-)- 


(A2) For some // > 0, Co- > 0 and some Hq > 0 one has 

iLologp 


J = arg min 
jcp 


~ fCj\\2 + 


E 


O'" 


'd{s,m,^,Ca) ^ n 


G Q{C,). 


(4.10) 


Note that Assumption A2 is natural and is similar to the usual assumptions that / is 
smooth and does not have fast oscillating components. In the context of the ill-posed problems. 
Assumption A2 means that / is not “too hard” to estimate. Under Assumptions Al and A2, one 
can prove the “fast” convergevce rates for / as well as obtain the error bounds for tto. 


Theorem 2 Let </>fc(0) = 0, k = 1, • • • ,p, n > Nq where Nq is defined in (|.^.5[ ). Let a = rocto 
where oq is defined in Let Assumptions Al and A2 hold with some p, and C^, |^| < s, 

Co = pCa and Hq > 2(1 -j-w)'^(4 t + 5), where w > {p + l)/(/U — 1). Then, with probability at least 
1 — 2p~'^, one has 


Wig - /lb < mf. 


11/ - fCj\\2 + 


4^0 log p 
'd{s,m,p,Ca 


E 


(4.11) 


Moreover, if Jo G Q{Ca) and |Jo| < s, then, with probability at least 1 — 4p ”, one has 

4iLo (1 + /-iCa)^ ||u||^ s logp 


'd‘^{s,m,p,Ca) 

where u is a vector with eomponents Uk = Cfc(O) defined in i2.13\} . 


T 

^ n ’ 
16 Jo 


(4.12) 


5 Numerical Simulations 

In order to evaluate the accuracy of the proposed estimator we carried out a simulation study 
where we tested performance of the proposed estimator under various scenarios. In order to assess 
precision of the estimator, for each of the scenarios, we evaluated the relative integrated error of g 
defined as 

= \\9 - gWl/Ml (5.1) 

where the norm was calculated over the grid xt = ih with i = 0,1,..., if vro = vro = 0 and 
i = 1,2,..., otherwise. In addition, we studied prediction properties of g. In particular, we 
constructed 

U = / 776 “^ 5(A)dA = 7roI(Z = 0 ) + (1 - 7ro)^4Cfc(0, 

•^0 k=i 
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where Uk{l) defined in equation (|2.13l) . and then evaluated the weighted £^-norni of the difference 
between the vectors u = (Dq, • • •,) and = {vq, ui,...) of, respectively, the predicted and the 
observed frequencies 

= \\u - u\\l/\\u\\l. (5.2) 

For the estimator proposed in this paper, we tested various computational schemes that differ by 
the strategies for selecting the penalty parameter a in expression (|2.18l) . In addition, we compared 
our estimator with the estimator of Comte and Genon-Catalot (2015). In particular, we considered 
the following techniques for choosing a. 

OPT : This estimator is obtained using algorithm presented in Section [3l Parameter a is 
optimally chosen by minimizing the difference US'— 5 II 2 between the true and estimated values 
of g. This estimator represents a benchmark for the proposed procedures but it is available 
only in simulation setting but not in practice. 

DD 12 ■ This estimator is obtained using algorithm presented in Section [3] where parameter 
a in (I2.18P is chosen by a data driven (DD) criterion. The general idea behind such kind of 
criterion is to measure, as a function of parameter a, the ability of the estimator to “fit” the 
observed data, and then to choose a maximizing such kind of measure. Since we use Ay as 
a measure of goodness of fit, estimator DD 12 uses the value of a that minimizes Ay. 

DDiike ■ This estimator is obtained using algorithm presented in Section [3] where parameter 
a is derived by maximizing the likelihood as suggested in [ 3 . In particular, one can check 
that the likelihood, given the data turns out to be of the form where 

M = max, Yi. 

NDE : This is the Nonparametric Density Estimator presented in Comte and Genon-Catalot 
(2015). The authors kindly provided the code. 

The set of test functions represents different situations inspired by the real data problem described 
in the next Section. In particular, we consider the following nine test functions: 

1. a gamma density g{X) = r(A; 3,1) 

2. a mixed gamma density g{\) = 0.3r(A; 3, 0.25) -|- 0.7r(A; 10, 0.6) 

3. an exponential density g{X) = r(A; 1, 2) 

4. a Weibull density g{\) = 9p~^x^~^ exp —{x/d)^lx>o, with p = 3 and 8 = 2 

5. a Gaussian density g{X) = N{X] 80,1) 

6. a mixed gamma density g{X) = 0.3r(A; 2, 0.3) -|- 0.7r(A; 40,1) 

7. a delta contaminated gamma density g{X) = 0.3<5(A) -|- 0.7r(A;40,1) 

8. a delta contaminated Gaussian density g{X) = 0.2<5(A) -|- 0.8A^(A; 80,8^) 

9. a delta contaminated Gaussian density g{X) = 0.2(i(A) -|- 0.8A^(A; 20,4^) 

The first four test functions have been analyzed in Gomte and Genon-Gatalot (2015) and 
represent cases where most of the data is concentrated near zero. The 5-th test function corresponds 
to the situation where most of the data is concentrated away from zero. The last four test functions 
represent the mixtures of the two previous scenarios. All nine densities are showed in Figure [H 
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Tables 1, 3 and 5 below display the average values of defined in (|5.ip while Tables 2, 4 
and 6 report A,y given by (|5.2I) (with the standard deviations in parentheses) over 100 different 
realizations of data Tj ~ Poisson{\i), i = 1, ..,n, where n = 10000 for Tables 1 and 2, n = 5000 
for Tables 3 and 4 and n = 1000 for Tables 5 and 6. The dictionary was constructed as a collection 
of the gamma pdfs m where parameters {ak,hk) belong to the Cartesian product of vectors 
a = [2,3,4, ••• ,150] and b = [0.1,0.15, •• • ,0.9,0.95], so that = 0. We chose the grid step 

h = 0.5. 

As it is expected, performances of all estimators deteriorate when n decreases, although not 
very significantly. For a hxed sample size, estimator OPT is the most precise in terms of Ag as 
a direct consequence of its definition, however, estimator DDuke is always comparable. Estimator 
DDi 2 has similar performance to DDuke except for cases 2 and 6 where the underlying densities 
are bimodal and, hence, data can be explained by a variety of density mixtures. In conclusion, 
apart from OPT which is not available in the case of real data, estimator DDuke turns out to be 
the most accurate in terms of both and A^/. For completeness, Figures [U and [2] exhibit some 
reconstructions obtained using estimator DDuke in case of n = 5000. 

Finally, we should mention that NDE is a projection estimator that uses only the first 
few Laguerre functions. For this reason, it fails to adequately represent a density function that 
corresponds to the situation where values Aj, i = 1, • • • , n, are concentrated away from zero, as it 
happens in case 5 (where NDE returns zero as an estimator) and case 6 (where NDE succeeds in 
reconstructing only the hrst part of the density near zero). Also, note that NDE errors are not 
displayed for cases 7, 8 and 9 since this estimator is not dehned for delta contaminated densities. 

6 Application to evaluation of the density of the Saturn ring 

The Saturn’s rings system can be broadly grouped into two categories: dense rings (A, B, C) and 
tenuous rings (D, E, G) (see the hrst panel of Figure [3]). The Cassini Division is a ring region 
that separates the A and B rings. The study of structure within Saturn’s rings originated with 
Campani, who observed in 1664 that the inner half of the disk was brighter than the outer half. 
Furthermore, in 1859, Maxwell proved that the rings could not be solid or liquid but were instead 
made up of an indehnite number of particles of various sizes, each on its own orbit about Saturn. 
Detailed ring structure was revealed for the hrst time, however, by the 1979 Pioneer and 1980- 
1981 Voyager encounters with Saturn. Images were taken at close range, by stellar occultation 
(observing the bickering of a star as it passes behind the rings), and by radio occultation (measuring 
the attenuation of the spacecraft’s radio signal as it passes behind the rings as seen from Earth) 
(see, e.g., Esposito et al, 2004). By analyzing the intensity of star light while it is passing through 
Saturn’s rings, astronomers can gain insight into properties telescopes cannot visually determine. 
Each sub-region in the rings has its own associated distinct distribution of of the density and sizes 
of the particles constituting the sub-region. This distribution uniquely determines the amount of 
light which is able to pass from a star (behind the rings) to the photometer. 

Our data V, i = 1, - ■ ■ , n, come from sets of observations of stellar occultations recorded by 
the Cassini UVIS high speed photometer and contains n = 7 615 754 photon counts at different 
radial points, located at 0.01-0.1 kilometer increments, on the Saturn’s rings plane (see the second 
panel of Figure [3]). With no outside inhuences, these photon counts should follow the Poisson 
distribution, however, obstructions imposed by the particles in the rings cause their distribution 
to deviate from Poisson. Indeed, if data were Poisson distributed, then its mean would be ap¬ 
proximately equal to its variance for every sub-region. However, as the third panel of Figure [3] 
shows, observations 1) have signihcantly higher variances than means. The latter is due to the 
fact that, although for each i = 1, - ■ ■ ,n, the photon counts V are Poisson{\i), the values of Aj, 
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i = 1, • • • ,n, are extremely varied and, specifically, cannot be modeled as values of a continuous 
function. In fact, intensities Aj, i = I,-- - ,n, are best described as random variables with an 
unknown underlying pdf g{\). In addition, if the ring region contains a significant proportion of 
large particle, those particles can completely block of the light leading to zero photon counts. For 
this reason, we allow g{\) to possibly contain a non-zero mass at A = 0, hence, being of the form 
(na). The shape of g{\) allows one to determine the density and distribution of the sizes of the 
particles of a respective sub-region of the Saturn rings. This information, in turn, should shed light 
on the question of the origin of the rings as well as how they reached their current configuration. 

In order to identify sub-regions of the Saturn rings with distinct properties, we segmented the 
data using a method presented in [5] which is designed for partitioning of complicated signals with 
several non-isolated and oscillating singularities. In particular, we applied the Gabor Continuous 
Wavelet Transform (see, e.g. [2T]) to the data and selected the highest scale where the number 
of wavelet modulus maxima takes minimum value. At this scale, we segmented the signal by the 
method proposed in [5]. We obtained a total of 1531 intervals of different sizes. Figures [H and 
[5] refer to six distinct sub-regions of the rings. The left panels of both figures show raw data. 
The right panels exhibit the sample and the estimated frequencies, with the penalty parameter 
obtained by DDm^g criterion, for six different intervals that are representative of different portions 
of the data set. 

Note that in Figure 01 for all three data segments, the estimated parameter vfo = 0. This 
is not true for the first and the second panels of Figure [5] where ttq = 0.5059 and vfo = 0.2463, 
respectively. The values of A^, defined in (|5.2p . obtained for the six data segments are, respectively, 
0.0128, 0.0159, 0.0022, 0.0229, 0.003 and 0.0095, and are consistent with the values obtained in 
simulations. Both, the right panels in Figures 0] and [5] and the values of A^/, confirm the ability of 
the estimator developed in the paper to accurately explain the Saturn’s rings data. 
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7 Proofs 

Proofs of Theorems [T] and [2] are based on the following statement which is the trivial modification 
of Lemma 3 of Pensky (2015). 

Lemma 1 (Pensky (2015)). Let f be the true function and Jq he its projection onto the linear 
span of the dictionary Cp. Let T be a diagonal matrix with components aj, j = 1, • • • ,p. Consider 
solution of the weighted Lasso problem 

6 = arg min |t^WW^t — 2t^[3 + a||Tt||i| . (7.1) 

with $ = W^W, (3 = ^6 and 

]3 = p + V^Trj + h, r],heMP, (7.2) 
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where h. is a nonrandom vector, Et 7 = 0 and components rjj of vector rj are random variables such 
that, for some K > 0 and any r > 0, there is a set 

= < a; ; max \rjj\ < ii'-^/(TM^TyTo^ i with P(0) > 1 — 2p~'^. (7.3) 

[ i<i<p J 

\hj\ 


Denote 


Ch = max 


Ca — K V'T" + 1 + Ch. 


(7.4) 


l<i<P \_(Tj y/e \ogp 

If 0(0 = CaVologp, then for any r > 0 and any a > ao, then, with probability at least 1 — 2p~'^, 
one has 

||/g-/||i<inf[||/t-/||2 + 4a||Tt||i]. (7.5) 

Moreover, if matrices $ and T are such that for some p, > 1 and any J <ZV 


K^{p, J) = min s d G D{p, J), ||d ||2 / 0 : 


d^$d • rr(T^ 

ll(Td)j||f ^ 


> 0 , 


(7.6) 


and a = wa^ where w > {p + l)/{p — 1), then for any r > 0 with probability at least 1 — 2p , 
one has 


il/§-/lii<.,w 


(1 + 


ll/t-/lli + 4a||(Tt)jc||i + 


:logp ^ 


(J,- 




(7.7) 


Proof of Theorem [H Let vectors b and respectively, have components bk = {4>ki f) 
and defined in (I2.16|) . It is easy to see that 


1 

fk - bk =-'Y^[ipk{Yi)-Ef^kiYi)] + Hk with Hk = - bk (7.8) 

i=l 

Applying Bernstein inequality, for any x > 0, obtain 


2=1 


xak] 2 

> —^ I < 2 exp —X 


n 


2 + 


‘ixakW'ipkW C 


3i/n 


-1' 


Using the fact that Af{B + C) > mm{A/{2B),A/{2C)) for any A,B,C > 0, under condition 
n > Nq, derive 




2=1 


> xn < 2exp —(x^/4). 


(7.9) 


Choosing x = 2^J (r + 1) logp and recalling that, according to (12.1011 . \Hk\ = n gather that 

lP(|?fc - bk\ > cf^[2^{T + l)logp + 1]) < p~Y^Y^ so that 


u : max 

\Ck - bk\ 

1 



< 2V(r + l)_logp+l l P(S1,) > 1 _ 2J,-, (7.10) 


n 


Then, validity of Theorem [T] follows directly from Lemma [T] with pk = f^kjoik and K = 2. 
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Proof of Theorem [2l Validity of inequality (14.111) follows from (|7.10p and Lemma 1 with 
K = 2. 

In order to establish upper bounds for (tto — note that P(y = 0) = tto + so that 


I I'o — P(V = 0) — d^u, if uq — 0 u > 0, 
"^0 ~ ^0 = \ 

y —TTo, if vq — 0 u < 0, 


(7.11) 


where d = 0 — 6 q. Then, by standard arguments (see, e.g. Dalalyan et al. (2014)), one has 

< d^(^ - b) + a(||T0o||i - ||T0||i) 

For w G fli where fli is defined in (j7.inp . one has 

+ (a - ao)||(Td)jc||i < (a + ao)||(Td)jJ|i. (7.12) 

Therefore, d G P(/i, Jq) where 'D{fi,J) is defined in (14.3|) . and due to Jq G G{Ca), the following 
inequality holds: 

||d,7g||i < 7tC'cr||djo||i. (7.13) 

Hence, by (14.9p . d^$d > ^,Ca). On the other hand, inequality (j7.12p yields d^$d < 

(a + ao)||djo|| 2 '^Tr(T^J, so that 


Idjolb < 


[a + oq ) 




(7.14) 


Using (I7.13P and (|7.14p . obtain that, for any oj G Hi, 

|d'^u| < ||djJ|i||ujJ|oo + ||djc||i||ujc||oo < ||djJ|i||u||oo(l + liOo-) 

< v^||djJ| 2 ||u||oo(l + ^iCa) < y/sao (1 + tu) ||u||oo(l +/iOc.) Tr(T\) f'&ys, m, C^) ■ 

In addition, there exists a set H 2 such that, for w G H 2 , one has \vq — P(y = 0)| < a/( rlogp)/n 
and P(H 2 ) > 1 — 2p~'^. Let H = Hi n H 2 . Then, P(H) > I — Ap~'^ and, for a; G H, 

y/s ao (1 + zu) \\u\\ooil + pCa) Jrlogp 

|z^o-P(U = 0)-d^u| < ---4-—-- + ^ (7.15) 


'&{s,m,p,Ca 


n 


Inequality (|7.15l) provides an upper bound for the error if uq — O u > 0. If i/q ~ ^ u < 0, note that 

0 < vTo = (P(y = 0) — 1^0 — d'^u) + (I'Q — 6 u) < P(y = 0) — t'o — d'^u, 

and again apply (I7.15h . Finally, plugging in the value of ao and using inequality for Hq, derive 
that, for a; G H, inequality p4.12p holds. 
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Table 1: Average values of (with their standard deviations in parentheses) over 100 simulation 
runs with n = 10000 


test case 

OPT 

DDi2 

^^like 

NDE 

easel 

0.0007 (0.0010) 

0.0023 (0.0028) 

0.0022 (0.0030) 

0.1183 (0.8307) 

case2 

0.0471 (0.0197) 

0.2214 (0.0503) 

0.0507 (0.0305) 

0.0613 (0.0716) 

caseS 

0.0142 (0.0398) 

0.0191 (0.0127) 

0.0138 (0.0087) 

0.0190 (0.0399) 

case4 

0.0043 (0.0021) 

0.0054 (0.0032) 

0.0061 (0.0052) 

0.0298 (0.0657) 

case5 

0.0042 (0.0033) 

0.0023 (0.0029) 

0.0014 (0.0021) 

1.0000 (0.0000) 

case6 

0.0793 (0.0247) 

0.4318 (0.0554) 

0.0839 (0.0241) 

0.3383 (0.0085) 

case7 

0.0067 (0.0012) 

0.0009 (0.0008) 

0.0008 (0.0008) 

- 

caseS 

0.0060 (0.0010) 

0.0068 (0.0009) 

0.0069 (0.0010) 

- 

case9 

0.0085 (0.0013) 

0.0099 (0.0014) 

0.0111 (0.0026) 

- 


Table 2: Average values of (with their standard deviations in parentheses) over 100 simulation 
runs with n = 10000 


test case 

OPT 

DDi2 

^^like 

NDE 

easel 

0.0011 (0.0009) 

0.0006 (0.0005) 

0.0007 (0.0005) 

0.0675 (0.5470) 

case2 

0.0013 (0.0007) 

0.0088 (0.0024) 

0.0014 (0.0010) 

0.0228 (0.0431) 

caseS 

0.0002 (0.0001) 

0.0006 (0.0005) 

0.0003 (0.0003) 

0.1130 (0.0653) 

case4 

0.0014 (0.0009) 

0.0011 (0.0006) 

0.0012 (0.0009) 

0.0205 (0.0530) 

case5 

0.0045 (0.0010) 

0.0043 (0.0010) 

0.0044 (0.0009) 

1.0000 (0.0000) 

case6 

0.0465 (0.1402) 

0.0288 (0.0045) 

0.0041 (0.0020) 

0.4376 (0.0618) 

case? 

0.0013 (0.0002) 

0.0006 (0.0002) 

0.0006 (0.0002) 

- 

caseS 

0.0039 (0.0009) 

0.0018 (0.0004) 

0.0018 (0.0004) 

- 

case9 

0.0035 (0.0006) 

0.0019 (0.0007) 

0.0020 (0.0006) 

- 


Table 3: Average values of A^ (with their standard deviations in parentheses) over 100 simulation 
runs with n = 5000 


test case 

OPT 

DDi2 

DDlike 

NDE 

easel 

0.0006 (0.0008) 

0.0038 (0.0049) 

0.0030 (0.0046) 

0.2424 (1.5002) 

case2 

0.0590 (0.0343) 

0.2106 (0.0549) 

0.0640 (0.0428) 

0.2048 (0.2196) 

caseS 

0.0148 (0.0097) 

0.0251 (0.0310) 

0.0178 (0.0119) 

0.0309 (0.0650) 

case4 

0.0055 (0.0019) 

0.0074 (0.0051) 

0.0086 (0.0060) 

0.0493 (0.1123) 

case5 

0.0069 (0.0052) 

0.0044 (0.0048) 

0.0024 (0.0036) 

1.0000 (0.0000) 

case6 

0.0830 (0.0277) 

0.4068 (0.0856) 

0.0879 (0.0266) 

0.3456 (0.0110) 

caseT 

0.0077 (0.0035) 

0.0023 (0.0030) 

0.0023 (0.0030) 

- 

caseS 

0.0065 (0.0021) 

0.0074 (0.0020) 

0.0075 (0.0021) 

- 

case9 

0.0096 (0.0023) 

0.0114 (0.0023) 

0.0128 (0.0029) 

- 
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Table 4: Average values of (with their standard deviations in parentheses) over 100 simulation 
runs with n = 5000 


test case 

OPT 

DDi^2 


NDE 

easel 

0.0017 

(0.0016) 

0.0010 

(0.0007) 

0.0012 

(0.0007) 

0.1393 

(1.0084) 

case2 

0.0020 

(0.0013) 

0.0090 

(0.0029) 

0.0022 

(0.0014) 

0.1184 

(0.1475) 

caseS 

0.0004 

(0.0002) 

0.0009 

(0.0013) 

0.0006 

(0.0004) 

0.1131 

(0.0948) 

case4 

0.0022 

(0.0014) 

0.0016 

(0.0008) 

0.0018 

(0.0013) 

0.0346 

(0.0859) 

case5 

0.0087 

(0.0019) 

0.0084 

(0.0018) 

0.0085 

(0.0018) 

1.0000 

(0.0000) 

case6 

0.0377 

(0.1361) 

0.0285 

(0.0068) 

0.0057 

(0.0030) 

0.4608 

(0.0849) 

case7 

0.0020 

(0.0003) 

0.0013 

(0.0003) 

0.0013 

(0.0003) 


- 

caseS 

0.0052 

(0.0011) 

0.0032 

(0.0008) 

0.0032 

(0.0008) 


- 

case9 

0.0044 

(0.0010) 

0.0031 

(0.0010) 

0.0031 

(0.0009) 


- 


Table 5: Average values of (with their standard deviations in parentheses) over 100 simulation 
runs with n = 1000 


test case 

OPT 

DDi2 

^^like 

NDE 

easel 

0.0040 

(0.0097) 

0.0221 

(0.0258) 

0.0176 

(0.0207) 

0.3004 

(0.9331) 

case2 

0.0992 

(0.0760) 

0.1973 

(0.0718) 

0.1335 

(0.0983) 

0.5370 

(0.0960) 

caseS 

0.0533 

(0.0889) 

0.0753 

(0.0838) 

0.0662 

(0.0894) 

0.1127 

(0.3912) 

case4 

0.0069 

(0.0014) 

0.0178 

(0.0183) 

0.0179 

(0.0135) 

0.1393 

(0.3381) 

case5 

0.0170 

(0.0108) 

0.0217 

(0.0253) 

0.0152 

(0.0223) 

1.0000 

(0.0000) 

case6 

0.1223 

(0.0710) 

0.3151 

(0.1409) 

0.1270 

(0.0759) 

0.4479 

(0.2572) 

case7 

0.0133 

(0.0115) 

0.0102 

(0.0118) 

0.0098 

(0.0118) 


- 

caseS 

0.0142 

(0.0137) 

0.0156 

(0.0139) 

0.0156 

(0.0154) 


- 

case9 

0.0121 

(0.0073) 

0.0163 

(0.0110) 

0.0160 

(0.0101) 


- 


Table 6: Average values of Aj^ (with their standard deviations in parentheses) over 100 simulation 
runs with n = 1000 


test case 

OPT 

DDi2 

D^like 

NDE 

easel 

0.0063 

(0.0042) 

0.0043 

(0.0026) 

0.0047 

(0.0027) 

0.1458 

(0.5377) 

case2 

0.0076 

(0.0051) 

0.0117 

(0.0047) 

0.0084 

(0.0048) 

0.3498 

(0.0937) 

caseS 

0.0021 

(0.0022) 

0.0031 

(0.0033) 

0.0027 

(0.0037) 

0.2149 

(0.3773) 

case4 

0.0075 

(0.0068) 

0.0046 

(0.0029) 

0.0048 

(0.0032) 

0.0967 

(0.3578) 

case5 

0.0427 

(0.0091) 

0.0411 

(0.0090) 

0.0416 

(0.0090) 

1.0000 

(0.0000) 

case6 

0.0157 

(0.0044) 

0.0304 

(0.0127) 

0.0166 

(0.0067) 

0.5344 

(0.1001) 

case7 

0.0072 

(0.0018) 

0.0067 

(0.0018) 

0.0067 

(0.0018) 


- 

caseS 

0.0154 

(0.0038) 

0.0143 

(0.0038) 

0.0142 

(0.0037) 


- 

case9 

0.0120 

(0.0034) 

0.0111 

(0.0032) 

0.0111 

(0.0032) 


- 
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Figure 1: The true density (red) and DDm^g estimators (blue) obtained in the first 10 simulation 
runs with sample size n = 5000 
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Figure 2: Sample frequencies (red) and estimated frequencies (blue) obtained in the first 10 simu¬ 
lation runs with sample size n = 5000 
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Figure 3: The first panel: Names of Saturns rings, courtesy science.nasa.gov. The second panel: 
the means of the binned total data set (100 observations per bin). The third panel: the means 
(red) and the variances (blue) of the binned data 
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Figure 4: Left panels: segments of data. Right panels: sample frequencies (blue) and estimated 
frequencies with the penalty parameter obtained by DDuke criterion (red). = 0.0128 (top 
panel), = 0.0159 (middle panel), /S.y = 0.0022, tto = 0 for all three cases, (bottom panel) 



data 1492993 : 1498291 
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Figure 5: Left panels: segments of data. Right panels: sample frequencies (blue) and estimated 
frequencies with the penalty parameter obtained by DDuke criterion (red). A^/ = 0.0229 and 
TTo = 0.5059 (top panel), Ajy = 0.003 and vro = 0.2463 (middle panel), = 0.0095 and tto = 0 
(bottom panel) 


data 22S4952 ; 231 01 -45 



X 1 


sarriplad and astimatad fratqLjancias 
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